Alert directives and focused alert directives in a behavioral recognition system

ABSTRACT

Alert directives and focused alert directives allow a user to provide feedback to a behavioral recognition system to always or never publish an alert for certain events. Such an approach bypasses the normal publication methods of the behavioral recognition system yet does not obstruct the system's learning procedures.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of co-pending U.S. patent application Ser. No. 13/839,587, filed on Mar. 15, 2013, which in turn claims priority to and benefit of U.S. Provisional Application Ser. No. 61/611,284, filed on Mar. 15, 2012; the entire contents of each aforementioned application are herein expressly incorporated by reference for all purposes.

FIELD

Embodiments of the present invention generally relate to configuring a behavioral recognition-based video surveillance system to generate alerts for certain events. More specifically, the embodiments provide techniques allowing a behavioral recognition system to identify events that should always or never result in an alert without impeding the unsupervised learning process of the surveillance system.

DESCRIPTION OF THE RELATED ART

Some currently available video surveillance systems provide simple object recognition capabilities. For example, a video surveillance system may be configured to classify a group of pixels (referred to as a “blob”) in a given frame as being a particular object (e.g., a person or vehicle). Once identified, a “blob” may be tracked from frame to frame in order to follow the “blob” moving through the scene over time, e.g., a person walking across the field of vision of a video surveillance camera. Further, such systems may be configured to determine when an object has engaged in certain predefined behaviors. For example, the system may include definitions used to recognize the occurrence of a number of predefined events, e.g., the system may evaluate the appearance of an object classified as depicting a car (a vehicle-appear event) coming to a stop over a number of frames (a vehicle-stop event). Thereafter, a new foreground object may appear and be classified as a person (a person-appear event), and the person then walks out of frame (a person-disappear event). Further, the system may be able to recognize the combination of the first two events as a “parking event.” Such surveillance systems typically require that the objects and/or behaviors which may be recognized by the system be defined in advance. Thus, in practice, these systems rely on predefined definitions for objects and/or behaviors to evaluate a video sequence. More generally, such systems rely on predefined rules and static patterns and are thus often unable to dynamically identify objects, events, behaviors, or patterns, much less classify them as either normal or anomalous.

On the other hand, a behavioral recognition system is a type of video surveillance system that may be configured to learn, identify, and recognize patterns of behavior by observing a sequence of individual frames, otherwise known as a video stream. Unlike rules-based video surveillance systems, a behavioral recognition system instead learns objects and behavioral patterns by generalizing video input and building memories of what is observed. Over time, a behavioral recognition system uses these memories to distinguish between normal and anomalous behavior captured in the field of view of a video stream. Upon detecting anomalous behavior, the behavioral recognition system publishes an alert notifying the user of the behavior. After several recurrences of a particular event, the behavioral recognition system learns that the event is non-anomalous and ceases publishing subsequent alerts. For example, a behavioral recognition system focused on a building corridor may initially publish alerts each time an individual appears in the corridor at a certain time of day within the field of view of the camera. If this event occurs a sufficient number of times, the behavioral recognition system may learn that this is non-anomalous behavior and stop alerting a user to this event.

However, although this is how a user expects such a system to work in most cases, in some instances the user may want the behavioral recognition system to always publish an alert for a particular behavioral event. Returning to the previous example, if the corridor were of limited access, security personnel may want to be notified each time someone appears in the corridor, to verify that the only people in the corridor are those authorized to be there. Conversely, the user may never want the behavioral recognition system to publish an alert for a particular behavior. This situation may arise where the event occurs often, but infrequently enough that each occurrence still results in an alert. For example, a behavioral recognition system focused on a room in a building that is next to a construction site may create alerts whenever construction vehicles pass through the field of view of the camera outside a window in the room. In this instance, security personnel may not want the behavioral recognition system to ever alert on these occurrences.

Behavioral recognition systems by their very nature avoid the use of predefined rules wherever possible in favor of unsupervised learning. Thus, addressing these issues requires a natural method for providing feedback to a behavioral recognition system regarding which behaviors should always or never result in an alert.

SUMMARY

One embodiment of the invention provides a method for alerting a user to behavior corresponding to an alert directive. This method may generally include obtaining characteristic values from an observed event in a scene. This method may also include parsing a list of alert directives for a matching alert directive having ranges of criteria values. If the characteristic values are within the ranges of the criteria values, then the observed event corresponds to a matching alert directive. This method may also include, upon identifying the matching alert directive, alerting the user to the observed event.

Additionally, the characteristic values of the observed event may be a pixel-height value, a pixel-width value, and an x- and y-coordinate center position of a foreground object. The characteristic values may also be a set of x- and y-coordinates corresponding to a trajectory of a foreground object. Further, the matching alert directive may have a focus mask that intersects with a region in the scene where the observed event occurred.

Other embodiments include, without limitation, a computer-readable medium that includes instructions that enable a processing unit to implement one or more aspects of the disclosed methods, as well as a system having a processor, memory, and application programs configured to implement one or more aspects of the disclosed methods.

BRIEF DESCRIPTION OF THE DRAWINGS

So that the manner in which the above-recited features, advantages, and objects of the present invention are attained and can be understood in detail, a more particular description, briefly summarized above, may be had by reference to the embodiments illustrated in the appended drawings.

It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the disclosure may admit to other equally effective embodiments.

FIG. 1 illustrates components of a video analysis system, according to one embodiment.

FIG. 2 further illustrates components of the video analysis system shown in FIG. 1, according to one embodiment.

FIG. 3 illustrates an example of an alert database in a client device, according to one embodiment.

FIG. 4 illustrates a method for publishing alerts in a behavioral recognition system configured with alert directives and focused alert directives, according to one embodiment.

FIG. 5 illustrates an example graphical representation of a set of tolerances applied to a trajectory alert, according to one embodiment.

FIG. 6 illustrates an example graphical representation of an alert directive and a focused alert directive applied to a particular alert, according to one embodiment.

DETAILED DESCRIPTION

Embodiments of the invention disclosed herein provide techniques for creating alert directives and focused alert directives in a behavioral recognition-based video surveillance system. That is, the disclosed techniques allow a user of a behavioral recognition system to identify previously alerted behavior that should always or never result in a subsequent alert. Because alert directives override only the behavioral recognition system's normal alert publication procedures (which take place after the system has already performed its learning procedures), this approach does not disrupt the behavioral recognition system's unsupervised learning.

In one embodiment, a behavioral recognition system includes a computer vision engine and a machine learning engine. The computer vision engine may be configured to process a field of view captured within a video stream. This field of view is generally referred to as the “scene.” In processing, the computer vision engine separates foreground objects (e.g., objects resembling people, vehicles, etc.) from background objects (e.g., objects resembling pavement, the sky, etc.). After processing the scene, the computer vision engine may generate information streams of observed activity (e.g., appearance features, kinematic features, etc.) and pass the streams to the machine learning engine. In turn, the machine learning engine may be configured to learn object behaviors in the scene using that information. In addition to learning-based behavior, a machine learning engine may be configured to build models of certain behaviors within a scene and determine whether observations indicate that the behavior of an object is anomalous, relative to the model. Upon detecting anomalous behavior, the machine learning engine generates an alert. After determining that the alert should be published, the behavioral recognition system publishes the alert to a user interface. The user interface may contain a database of previously issued alerts that is generally accessible to a user of the system. The user can view these alerts as a list, where each list item displays information about the alert and may include corresponding video or image data.

After publishing a sufficient number of alerts for a particular behavioral event, the machine learning engine learns that the event is a non-anomalous occurrence and ceases to publish subsequent alerts for the event. In one embodiment, a user may create an alert directive to override the normal alert publication process. An alert directive allows a user to provide feedback to the machine learning engine to either always or never create an alert for a certain behavioral event. The machine learning engine consults a list of alert directive definitions after learning from the information streams relating to an event and before evaluating the event for anomalous behavior. Thus, alert directives do not hinder the machine learning engine's learning procedures.

To create an alert directive, a user selects an event occurrence or an alert previously generated by the system to use as a template. For example, a user may parse through a database of previous alerts for a scene, characterized based on time, type, name, event, or otherwise, as well as view underlying video of the activity that caused the alert. In one embodiment, a user may do this via a dialog box in an alert browser on the user interface. After selecting an alert, the user defines alert directive matching criteria. In one embodiment, the criteria may include whether the behavior should always or never result in an alert, how frequently the alert should be published (e.g., in situations where the behavior results in numerous alerts within a short time span), and whether the machine learning engine should match behaviors or object types (or both). Once the user has defined the matching criteria, the user interface creates an alert directive in the alert database with references pointing back to the original alert used to create it. Thereafter, the user interface sends information about the alert directive to the machine learning engine.

Note that matching an alert directive to alert behavior by the machine learning engine may depend on both the alert type and a series of parameters specified by a user. For example, once a user selects an alert to use for an alert directive, the corresponding video of that alert may show a person in front of a security door, along with a bounding box indicating the pixels classified by the system as depicting that person. In such a case, a graphical editor may allow a user to adjust the bounding box around the person in the selected alert to adjust the tolerances for the alert directive, creating a range, e.g., for the center (x,y) position of a foreground object, etc. Similarly, a user may specify another bounding box for the relative position of the person in front of the door, i.e., a tolerance for object position. By adjusting tolerances for the size and pose of a person and for the position of such a person, an alert directive may be used to specify a region in front of the security door, so that whenever any person is observed to be present, the machine learning engine creates an alert. Thus, this approach allows observed objects that vary in height, position, width, speed, etc., to still satisfy the alert directive definitions.
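As a concrete illustration of this kind of tolerance matching, the following is a minimal sketch, not the patented implementation: the class names, field names, and Python representation are all hypothetical assumptions. An observed event matches when each characteristic value falls within the directive's corresponding range.

    from dataclasses import dataclass

    @dataclass
    class PointEvent:
        """Characteristic values of an event observed at a point in time."""
        height: float  # bounding-box height, in pixels
        width: float   # bounding-box width, in pixels
        cx: float      # x-coordinate of the object's center
        cy: float      # y-coordinate of the object's center

    @dataclass
    class Tolerances:
        """Per-characteristic (min, max) ranges set via the graphical editor."""
        height: tuple
        width: tuple
        cx: tuple
        cy: tuple

    def matches_directive(event: PointEvent, tol: Tolerances) -> bool:
        """True when every characteristic value falls inside its range."""
        checks = [(event.height, tol.height), (event.width, tol.width),
                  (event.cx, tol.cx), (event.cy, tol.cy)]
        return all(low <= value <= high for value, (low, high) in checks)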

In a further embodiment, an alert directive may be expanded to provide a focused alert directive. This approach extends the cases in which the machine learning engine can apply an alert directive to behavioral events in the scene. For example, a camera may focus on a building corridor with multiple security doors. In such a case, a user would otherwise need to create and tune a separate alert directive for each door to be alerted whenever someone appears in front of a door. As an alternative, a focused alert directive allows the user to create both an alert directive (e.g., for a person appearing in front of a security door) and a focus mask specifying different regions in the scene which should result in an alert.

That is, rather than specifying a tolerance for the position of a person in front of a security door, the user can extend the position tolerance to the full field of view. The user defines one or more regions of the scene where an alert should be generated when a foreground object otherwise within the tolerances of the alert directive is observed. For example, the alert of the person appearing in front of the first door (within the alert directive tolerances for height, width, and pose) defines an alert directive, but the position is extended to be the entire field of view of the camera, intersected with the user-defined regions. So, to create a focused alert directive in the given example, where a camera is focused on an area with multiple security doors, the user would select a “person appears” alert for a person in front of any one of the doors, specify tolerances around the appearance of that person using a graphical editor, and then create a mask for the position of the “person appears” alert to include the regions generally in front of each door.
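The focused variant can be sketched the same way. The snippet below assumes the event and tolerance shapes from the previous sketch and models the focus mask as a 2-D boolean array the size of the frame, with True marking the user-selected regions; this representation is an illustrative assumption, not the patent's data layout.

    import numpy as np

    def matches_focused_directive(event, tol, focus_mask: np.ndarray) -> bool:
        # Appearance must still satisfy the height/width tolerances.
        appearance_ok = (tol.height[0] <= event.height <= tol.height[1]
                         and tol.width[0] <= event.width <= tol.width[1])
        # Position tolerance is the full field of view intersected with the
        # user-defined regions: the object's center must land on a True pixel.
        x, y = int(event.cx), int(event.cy)
        in_frame = (0 <= y < focus_mask.shape[0]
                    and 0 <= x < focus_mask.shape[1])
        return appearance_ok and in_frame and bool(focus_mask[y, x])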

Once a user creates an alert directive (or focused alert directive), the user interface sends information about the alert directive (and focus mask, if applicable) to the machine learning engine. When the machine learning engine processes information of a subsequent event that matches an alert directive's match criteria and tolerances, the machine learning engine bypasses the normal publication methods of the behavioral recognition system and immediately publishes an alert or discards the event (per the matching criteria), irrespective of whether the machine learning engine regards the observed behavior as anomalous. This approach does not change the learned state regarding a particular scene or influence the undirected learning of the machine learning engine. In all cases, the machine learning engine has already performed its learning procedures before applying the alert directive.

In the following, reference is made to embodiments of the disclosure. However, it should be understood that the disclosure is not limited to any specifically described embodiment. Instead, any combination of the following features and elements, whether related to different embodiments or not, is contemplated to implement and practice what is disclosed. Furthermore, in various embodiments the present invention provides numerous advantages over the prior art. However, although embodiments may achieve advantages over other possible solutions and/or over the prior art, whether or not a particular advantage is achieved by a given embodiment is not limiting. Thus, the following aspects, features, embodiments, and advantages are merely illustrative and are not considered elements or limitations of the appended claims except where explicitly recited in a claim(s). Likewise, any reference to “the invention” or “the disclosure” shall not be construed as a generalization of any inventive subject matter disclosed herein and shall not be considered to be an element or limitation of the appended claims except where explicitly recited in a claim(s).

One embodiment of the present invention is implemented as a program product for use with a computer system. The program(s) of the program product define functions of the embodiments (including the methods described herein) and can be contained on a variety of computer-readable storage media. Examples of computer-readable storage media include (i) non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM or DVD-ROM disks readable by an optical media drive) on which information is permanently stored, and (ii) writable storage media (e.g., a hard-disk drive) on which alterable information is stored. Such computer-readable storage media, when carrying computer-readable instructions that direct the functions of the invention, are embodiments of the invention. Other example media include communications media through which information is conveyed to a computer, such as through a computer or telephone network, including wireless communications networks.

In general, the routines executed to implement the embodiments of the invention may be part of an operating system or a specific application, component, program, module, object, or sequence of instructions. The computer program of the invention typically is comprised of a multitude of instructions that will be translated by the native computer into a machine-readable format and hence executable instructions. Also, programs are comprised of variables and data structures that either reside locally to the program or are found in memory or on storage devices. In addition, various programs described herein may be identified based upon the application for which they are implemented in a specific embodiment of the disclosure. However, it should be appreciated that any particular program nomenclature that follows is used merely for convenience, and thus the present disclosure should not be limited to use solely in any specific application identified and/or implied by such nomenclature.

FIG. 1 illustrates components of a video analysis and behavioral recognition system 100, according to one embodiment. As shown, the behavioral recognition system 100 includes a video input source 105, a network 110, a computer system 115, and input and output devices 118 (e.g., a monitor, a keyboard, a mouse, a printer, and the like). The network 110 may transmit video data recorded by the video input 105 to the computer system 115. Illustratively, the computer system 115 includes a CPU 120, storage 125 (e.g., a disk drive, optical disk drive, floppy disk drive, and the like), and a memory 130 containing both a computer vision engine 135 and a machine learning engine 140. As described in greater detail below, the computer vision engine 135 and the machine learning engine 140 may provide software applications configured to analyze a sequence of video frames provided by the video input 105.

Network 110 receives video data (e.g., video stream(s), video images, or the like) from the video input source 105. The video input source 105 may be a video camera, a VCR, DVR, DVD, computer, web-cam device, or the like. For example, the video input source 105 may be a stationary video camera aimed at a certain area (e.g., a subway station, a parking lot, a building entry/exit, etc.), which records the events taking place therein. Generally, the area within the camera's field of view is referred to as the scene. The video input source 105 may be configured to record the scene as a sequence of individual video frames at a specified frame rate (e.g., 24 frames per second), where each frame includes a fixed number of pixels (e.g., 320×240). Each pixel of each frame may specify a color value (e.g., an RGB value) or grayscale value (e.g., a radiance value between 0 and 255). Further, the video stream may be formatted using known formats, e.g., MPEG2, MJPEG, MPEG4, H.263, H.264, and the like.

As noted above, the computer vision engine 135 may be configured to analyze this raw information to identify active objects in the video stream, identify a variety of appearance and kinematic features used by a machine learning engine 140 to derive object classifications, derive a variety of metadata regarding the actions and interactions of such objects, and supply this information to the machine learning engine 140. In turn, the machine learning engine 140 may be configured to evaluate, observe, learn, and remember details regarding events (and types of events) that transpire within the scene over time.

In one embodiment, the machine learning engine 140 receives the video frames and the data generated by the computer vision engine 135. The machine learning engine 140 may be configured to analyze the received data, cluster objects having similar visual and/or kinematic features, and build semantic representations of events depicted in the video frames. The machine learning engine 140 learns expected patterns of behavior for objects that map to a given cluster. Thus, over time, the machine learning engine learns from these observed patterns to identify normal and/or abnormal events. That is, rather than having patterns, objects, object types, or activities defined in advance, the machine learning engine 140 builds its own model of what different object types have been observed (e.g., based on clusters of kinematic and/or appearance features) as well as a model of expected behavior for a given object type. Thereafter, the machine learning engine can decide whether the behavior of an observed event is anomalous or not based on prior learning.

Data describing whether a normal or abnormal behavior or event has been determined, and describing what such behavior or event is, may be provided to output devices 118 to issue alerts, for example, an alert message with corresponding video and image data presented on a GUI screen. Such output devices may also be configured with a database of previously issued alerts from which a user can create an alert directive.

In general, the computer vision engine 135 and the machine learning engine 140 both process video data in real time. However, time scales for processing information by the computer vision engine 135 and the machine learning engine 140 may differ. For example, in one embodiment, the computer vision engine 135 processes the received video data frame by frame, while the machine learning engine 140 processes data every N frames. In other words, while the computer vision engine 135 may analyze each frame in real time to derive a set of kinematic and appearance data related to objects observed in the frame, the machine learning engine 140 is not constrained by the real-time frame rate of the video input.

Note, however, that FIG. 1 illustrates merely one possible arrangement of the behavior-recognition system 100. For example, although the video input source 105 is shown connected to the computer system 115 via the network 110, the network 110 is not always present or needed (e.g., the video input source 105 may be directly connected to the computer system 115). Further, various components and modules of the behavior-recognition system 100 may be implemented in other systems. For example, in one embodiment, the computer vision engine 135 may be implemented as a part of a video input device (e.g., as a firmware component wired directly into a video camera). In such a case, the output of the video camera may be provided to the machine learning engine 140 for analysis. Similarly, the output from the computer vision engine 135 and machine learning engine 140 may be supplied over computer network 110 to other computer systems. For example, the computer vision engine 135 and machine learning engine 140 may be installed on a server system and configured to process video from multiple input sources (i.e., from multiple cameras). In such a case, a client application 250 running on another computer system may request (or receive) the results over network 110.

FIG. 2 further illustrates components of the computer vision engine 135 and the machine learning engine 140 first illustrated in FIG. 1, according to one embodiment of the invention. As shown, the computer vision engine 135 includes a data ingestor 205, a detector 210, a tracker 215, a context event generator 220, an alert generator 225, and an event bus 230. Collectively, the components 205, 210, 215, and 220 provide a pipeline for processing an incoming sequence of video frames supplied by the video input source 105 (indicated by the solid arrows linking the components). In one embodiment, the components 205, 210, 215, and 220 may each provide a software module configured to provide the functions described herein. Of course, one of ordinary skill in the art will recognize that the components 205, 210, 215, and 220 may be combined (or further subdivided) to suit the needs of a particular case, and further that additional components may be added to (or some removed from) a video surveillance system.

In one embodiment, the data ingestor 205 receives video input from the video input source 105. The data ingestor 205 may be configured to preprocess the input data before sending it to the detector 210. The detector 210 may be configured to separate each frame of video provided into a stationary or static part (the scene background) and a collection of volatile parts (the scene foreground). The frame itself may include a two-dimensional array of pixel values for multiple channels (e.g., RGB channels for color video, or a grayscale or radiance channel for black and white video). In one embodiment, the detector 210 may model background states for each pixel using an adaptive resonance theory (ART) network. That is, each pixel may be classified as depicting scene foreground or scene background using an ART network modeling that pixel. Of course, other approaches to distinguish between scene foreground and background may be used.

Additionally, the detector 210 may be configured to generate a mask used to identify which pixels of the scene are classified as depicting foreground and, conversely, which pixels are classified as depicting scene background. The detector 210 then identifies regions of the scene that contain a portion of scene foreground (referred to as a foreground “blob” or “patch”) and supplies this information to subsequent stages of the pipeline. Additionally, pixels classified as depicting scene background may be used to generate a background image modeling the scene.
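For illustration, one common way to go from a per-pixel foreground mask to foreground blobs is connected-component labeling, sketched below. The use of scipy here is purely illustrative; the detector 210 described above may identify patches differently.

    import numpy as np
    from scipy import ndimage

    def extract_blobs(foreground_mask: np.ndarray, min_pixels: int = 50):
        """Return bounding slices for connected foreground regions ("blobs")."""
        labels, _ = ndimage.label(foreground_mask)  # label connected components
        sizes = np.bincount(labels.ravel())         # pixel count per label
        blobs = []
        for i, region in enumerate(ndimage.find_objects(labels), start=1):
            if region is not None and sizes[i] >= min_pixels:  # drop noise specks
                blobs.append(region)                # (row-slice, col-slice) pair
        return blobs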

In one embodiment, the detector 210 may be configured to detect the flow of a scene. Once the foreground patches have been separated, the detector 210 examines, from frame to frame, any edges and corners of all foreground patches. The detector 210 will identify foreground patches moving in a similar flow of motion as most likely belonging to a single object or a single association of motions and send this information to the tracker 215.

The tracker 215 may receive the foreground patches produced by the detector 210 and generate computational models for the patches. The tracker 215 may be configured to use this information, and each successive frame of raw video, to attempt to track the motion of an object depicted by a given foreground patch as it moves about the scene. That is, the tracker 215 provides continuity to other elements of the system by tracking a given object from frame to frame. It further calculates a variety of kinematic and/or appearance features of a foreground object, e.g., size, height, width, and area (in pixels), reflectivity, shininess, rigidity, speed, velocity, etc.

The context event generator 220 may receive the output from other stages of the pipeline. Using this information, the context event generator 220 may be configured to generate a stream of context events regarding objects tracked (by tracker component 215). For example, the context event generator 220 may package a stream of micro-feature vectors and kinematic observations of an object and output this to the machine learning engine 140, e.g., at a rate of 5 Hz. In one embodiment, the context events are packaged as a trajectory. As used herein, a trajectory generally refers to a vector packaging the kinematic data of a particular foreground object in successive frames or samples. Each element in the trajectory represents the kinematic data captured for that object at a particular point in time. Typically, a complete trajectory includes the kinematic data obtained when an object is first observed in a frame of video along with each successive observation of that object up to when it leaves the scene (or becomes stationary to the point of dissolving into the frame background). Accordingly, assuming the computer vision engine 135 is operating at a rate of 5 Hz, a trajectory for an object is updated every 200 milliseconds, until complete. The context event generator 220 may also calculate and package the appearance data of every tracked object by evaluating the object for various appearance attributes, such as shape, width, and other physical features, and assigning each attribute a numerical score.
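A trajectory of the kind described above might be represented as follows. This is a hedged sketch; the field names and the choice of kinematic attributes are assumptions for illustration, not the system's actual packaging.

    from dataclasses import dataclass, field

    @dataclass
    class KinematicSample:
        t: float   # timestamp, in seconds
        cx: float  # center x, in pixels
        cy: float  # center y, in pixels
        vx: float  # velocity x, in pixels/second
        vy: float  # velocity y, in pixels/second

    @dataclass
    class Trajectory:
        object_id: int
        samples: list = field(default_factory=list)

        def update(self, sample: KinematicSample) -> None:
            """Append the latest observation; at 5 Hz this runs every 200 ms."""
            self.samples.append(sample)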

The computer vision engine 135 may take the output from the components 205, 210, 215, and 220 describing the motions and actions of the tracked objects in the scene and supply this information to the machine learning engine 140 through the event bus 230. Illustratively, the machine learning engine 140 includes a classifier 235, a mapper 240, a semantic module 245, a cognitive module 250, and a normalization module 265.

The classifier 235 receives context events, such as kinematic data and appearance data, from the computer vision engine 135 and maps the data onto a neural network. In one embodiment, the neural network is a combination of a self-organizing map (SOM) and an ART network, shown in FIG. 2 as a SOM-ART classifier 236. The data is clustered and combined by features occurring repeatedly in association with each other. Then, based on those recurring types, the classifier 235 defines types of objects. For example, the classifier 235 may define foreground patches that have, for example, high shininess, rigidity, and reflectivity as a Type 1 object. These defined types then propagate throughout the rest of the system.

The mapper 240 may use these types by searching for spatial and temporal correlations and behaviors across the system for patches to create maps of where and when events are likely or unlikely to happen. In one embodiment, the mapper 240 includes a temporal memory ART network 241, a spatial memory ART network 242, and statistical engines 243. For example, the mapper 240 may look for patches of Type 1 objects. The spatial memory ART network 242 uses the statistical engines 243 to create statistical data about these objects, such as where in the scene these patches appear, in what direction they tend to go, how fast they go, whether they change direction, and the like. The mapper 240 then builds a neural network of this information, which becomes a memory template against which to compare object behaviors. The temporal memory ART network 241 uses the statistical engines 243 to create statistical data based on samplings of time slices. In one embodiment, initial sampling occurs at thirty-minute intervals. If many events occur within a time slice, then the time resolution may be dynamically changed to a finer resolution. Conversely, if fewer events occur within a time slice, then the time resolution may be dynamically changed to a coarser resolution.
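The dynamic time-slice resolution described above can be illustrated with a small helper. The halving/doubling policy and the specific thresholds below are assumptions chosen for the sketch; the text specifies only that resolution becomes finer when a slice is busy and coarser when it is quiet.

    def next_resolution(current_minutes: float, events_in_slice: int,
                        busy: int = 100, quiet: int = 5) -> float:
        """Refine a busy time slice, coarsen a quiet one, otherwise keep it."""
        if events_in_slice > busy:
            return max(current_minutes / 2, 1.0)    # finer, with a 1-minute floor
        if events_in_slice < quiet:
            return min(current_minutes * 2, 120.0)  # coarser, with a 2-hour cap
        return current_minutes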

In one embodiment, the semantic module 245 includes a phase-space partitioning component 246. The semantic module 245 identifies patterns of motion or trajectories within a scene and analyzes the scene for anomalous behavior through generalization. By tessellating a scene and dividing the foreground patches into many different tessera, the semantic module 245 can trace an object's trajectory and learn patterns from the trajectory. The semantic module 245 analyzes these patterns and compares them with other patterns. As objects enter a scene, the phase-space partitioning component 246 builds an adaptive grid and maps the objects and their trajectories onto the grid. As more features and trajectories are populated onto the grid, the machine learning engine learns trajectories that are common to the scene and further distinguishes normal behavior from anomalous behavior.

In one embodiment, the cognitive module 250 includes a perceptual memory 251, an episodic memory 252, a long-term memory 253, a workspace 254, and codelets 255. Generally, the workspace 254 provides a computational engine for the machine learning engine 140. For example, the workspace 254 may be configured to copy information from the perceptual memory 251, retrieve relevant memories from the episodic memory 252 and the long-term memory 253, and select which codelets 255 to execute. In one embodiment, each codelet 255 is a software program configured to evaluate different sequences of events and to determine how one sequence may follow (or otherwise relate to) another (e.g., a finite state machine). More generally, a codelet may provide a software module configured to detect interesting patterns from the streams of data fed to the machine learning engine. In turn, the codelet 255 may create, retrieve, reinforce, or modify memories in the episodic memory 252 and the long-term memory 253. By repeatedly scheduling codelets 255 for execution and copying memories and percepts to/from the workspace 254, the machine learning engine 140 performs a cognitive cycle used to observe, and learn, about patterns of behavior that occur within the scene.

In one embodiment, the perceptual memory 251, the episodic memory 252, and the long-term memory 253 are used to identify patterns of behavior, evaluate events that transpire in the scene, and encode and store observations. Generally, the perceptual memory 251 receives the output of the computer vision engine 135 (e.g., a stream of context events). The episodic memory 252 stores data representing observed events with details related to a particular episode, e.g., information describing time and space details related to an event. That is, the episodic memory 252 may encode specific details of a particular event, i.e., “what and where” something occurred within a scene, such as a particular vehicle (car A) moving to a location believed to be a parking space (parking space 5) at 9:43 AM.

In contrast, the long-term memory 253 may store data generalizing events observed in the scene. To continue with the example of a vehicle parking, the long-term memory 253 may encode information capturing observations and generalizations learned by an analysis of the behavior of objects in the scene, such as “vehicles tend to park in a particular place in the scene,” “when parking, vehicles tend to move at a certain speed,” and “after a vehicle parks, people tend to appear in the scene proximate to the vehicle,” etc. Thus, the long-term memory 253 stores observations about what happens within a scene with much of the particular episodic details stripped away. In this way, when a new event occurs, memories from the episodic memory 252 and the long-term memory 253 may be used to relate and understand the current event, i.e., the new event may be compared with past experience, leading to reinforcement, decay, and adjustments to the information stored in the long-term memory 253 over time. In a particular embodiment, the long-term memory 253 may be implemented as an ART network and a sparse-distributed memory data structure. Importantly, however, this approach does not require the different object type classifications to be defined in advance.

In one embodiment, modules 235, 240, 245, and 250 include an anomaly detection component, as depicted by components 237, 244, 247, and 256. Each anomaly detection component is configured to identify anomalous behavior, relative to past observations of the scene. Further, each component is configured to receive alert directive and focus mask information from the alert database 270. Generally, if any anomaly detection component identifies anomalous behavior, the component generates an alert and passes the alert through the normalization module 265. For instance, anomaly detector 247 in the semantic module 245 detects unusual trajectories using learned patterns and models. If a foreground object exhibits loitering behavior, for example, anomaly detection component 247 evaluates the object trajectory using loitering models, subsequently generates an alert, and sends the alert to the normalization module 265. Upon receiving an alert, the normalization module 265 evaluates whether the alert should be published based on the alert's rarity relative to previous alerts of that alert type. Once the normalization module 265 determines that the alert should be published, it passes the alert to the alert generator 225 (through the event bus 230).

However, if an anomaly detection component identifies an event that matches an alert directive, then rather than evaluating the event for anomalous behavior, the anomaly detection component instead follows the match criteria of the alert directive. If the alert directive requires that an alert be published, the anomaly detection component sends an alert to the alert generator 225 (through the event bus 230). Otherwise, the anomaly detection component discards the event. Note that in either case, the anomaly detection component does not send any information to the normalization module 265 if the event data matches an alert directive.
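The resulting dispatch can be summarized in a short sketch. All names below are hypothetical, and the callbacks stand in for the normalization module 265, the alert generator 225, and the normal anomaly evaluation path.

    def handle_event(event, directives, evaluate_anomaly,
                     normalize_and_publish, publish_immediately):
        for directive in directives:
            if directive.matches(event):
                if directive.always_alert:
                    publish_immediately(event)  # straight to the alert generator
                # a "never alert" directive discards the event silently
                return                          # normalization is never consulted
        alert = evaluate_anomaly(event)         # normal path: anomaly evaluation
        if alert is not None:
            normalize_and_publish(alert)        # rarity-based publication decision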

In one embodiment, the alert generator 225 resides in the computer vision engine 135. The alert generator 225 receives alert information from the anomaly detection components 237, 244, 247, and 256 and the normalization module 265. The alert generator 225 publishes alert information to the GUI/client device 260. The GUI/client device stores this alert information in the alert database 270. The alert database 270 contains previously issued alerts and may be accessible to a user of the GUI/client device 260.

FIG. 3 illustrates an example of an alert database 300 in a client device, according to one embodiment. The alert database 300 stores previously issued alerts that a user may parse through to create an alert directive. As shown, the alert database 300 includes a plurality of alerts and an alert directive list 305. Each alert 310 includes an identifier 311, a directive identifier 312, and a summary 313. The identifier 311 is a unique numerical value assigned to the alert 310. The directive identifier 312 is a numerical field that indicates whether the alert 310 has been assigned an alert directive.

The summary 313 is a data payload that contains a concise description of the data characterizing the alert. The summary 313 may include information about the type of anomaly, what time the anomaly occurred, height and width values and an x- and y-coordinate of an object (if the anomaly occurred at a point in time), a set of x- and y-coordinates corresponding to a trajectory (if the anomaly occurred over a series of frames), and the like. Alert directives evaluate object behaviors or object types (or both) that match the information provided in the summary 313.

The alert directive list 305 includes a plurality of alert directives. Each alert directive 320 has an identifier 321, an alert pointer 322, match criteria 323, and an epilog 324. The identifier 321 of the alert directive is a unique numerical value assigned to the alert directive. The alert pointer 322 is a pointer to the original alert to which the alert directive corresponds. By pointing to the original alert, the alert directive 320 can access the data provided by the summary 313. In one embodiment, the information contained in the summary 313 may be stored as a data packet in the corresponding alert directive 320.

Match criteria 323 contains user-specified information about how the alert directive should process a certain event, such as whether the machine learning engine should publish an alert or discard the behavior, and whether to match an alert directive to a behavior or to an object type (or both). For example, if a user chooses to disregard behavior in matching for an “unusual location” alert, the machine learning engine may create alerts for an object at rest at the location specified by the alert directive, and it may also create alerts for an object moving rapidly through the same location. As another example, if a user chooses to disregard types in matching for an “unusual location” alert, the machine learning engine may create alerts for an object corresponding to a learning-based classification type 1 (e.g., a car) positioned at the location, and the machine learning engine may also create alerts for an object corresponding to a learning-based classification type 2 (e.g., a person) positioned at the location.

The epilog 324 is an array of tolerance values for each corresponding alert characteristic in the data provided by the summary 313. Tolerances provide the machine learning engine with flexibility in matching object behaviors and types to an alert directive, as the likelihood of matching two objects having exactly the same characteristics (height, width, and the center (x,y) position) in a scene is very low. In one embodiment, a user defines these tolerances by using a graphical editor on a selected alert. By drawing a bounding box around the object that triggered the alert, the user can adjust the tolerances for the alert directive, creating a range for several characteristics of the selected alert (e.g., for the heights and widths of the object).
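As an illustration of how an epilog of tolerances might be derived from a user-adjusted bounding box, consider the following sketch. The record layout and the slack parameters are assumptions; the text specifies only that the epilog holds a tolerance range per alert characteristic.

    from dataclasses import dataclass

    @dataclass
    class AlertSummary:
        anomaly_type: str
        timestamp: float
        height: float  # pixel height of the object
        width: float   # pixel width of the object
        cx: float      # center x-coordinate
        cy: float      # center y-coordinate

    def epilog_from_box(summary: AlertSummary, size_slack: float = 0.25,
                        pos_slack: float = 40.0) -> dict:
        """Map each characteristic to a (min, max) tolerance range."""
        h, w = summary.height, summary.width
        return {
            "height": (h * (1 - size_slack), h * (1 + size_slack)),
            "width": (w * (1 - size_slack), w * (1 + size_slack)),
            "cx": (summary.cx - pos_slack, summary.cx + pos_slack),
            "cy": (summary.cy - pos_slack, summary.cy + pos_slack),
        }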

FIG. 4 illustrates a method 400 for publishing alerts in a behavioral recognition system configured with alert directives, according to one embodiment. The method 400 begins at step 405, where the machine learning engine loads an alert directives list (and a focus mask, if applicable). In one embodiment, the machine learning engine loads the alert directives list at system startup. Additionally, when a user creates an alert directive after startup has occurred, the user interface sends information about the alert directive to the machine learning engine. At step 410, the machine learning engine processes a behavioral event. For instance, the machine learning engine may process information generated by the computer vision engine corresponding to a person standing at a point in the scene. By this point, the machine learning engine has completed its learning procedures. At step 415, the machine learning engine searches the alert directives list to determine whether the behavior corresponds to an alert directive based on matching criteria. If there is a matching alert directive (step 425), then the machine learning engine bypasses the normal publication process and publishes an alert to the user interface. In the ongoing example, the alert directives list may include a directive to always issue an “unusual location” alert for any person (i.e., an object model corresponding to a person) standing in a certain position of the scene, given tolerances for height, width, and the person's central (x,y) position. If the observed person's height, width, and location coordinates match the alert directive, the behavioral recognition system immediately publishes an “unusual location” alert. However, if there is no matching alert directive (step 430), the machine learning engine proceeds with the normal publication process and evaluates the event for anomalous behavior.

FIG. 5 illustrates an example graphical representation of a set of tolerances applied to a trajectory alert, according to one embodiment. Because a trajectory takes place over a series of video frames, the machine learning engine matches trajectory-based events to an alert directive differently from behavioral events that happen at a point in the scene, and thus tolerances are also created differently. The original trajectory 515 represents a trajectory that resulted in an alert. As shown, the original trajectory 515 includes a starting point 505 and an ending point 510, with a distance 525 between the two points. In addition to these components, an alert directive for the original trajectory 515 includes a set of coordinates corresponding to the path. In one embodiment, a user may, through a graphical interface, assign tolerances to the trajectory so that future occurrences of the trajectory are not required to strictly adhere to the coordinates of the original trajectory 515. By creating bounding boxes around both the starting point 505 and the ending point 510, the user specifies a tolerance region (represented by the region enclosed by dotted lines) within which a trajectory must occur to trigger the alert directive. Thus, an object traveling on an alternate trajectory 520 triggers the alert directive because the trajectory is within the set of tolerances (shown by being within the region enclosed by the dotted lines).
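The endpoint-based trajectory tolerance of FIG. 5 might be checked as sketched below. This is a deliberate simplification that tests only the start and end points against the user-drawn boxes; a fuller implementation could also constrain the intermediate path to the dotted tolerance region.

    def in_box(point, box):
        """point is (x, y); box is (x0, y0, x1, y1) with x0 <= x1, y0 <= y1."""
        x, y = point
        x0, y0, x1, y1 = box
        return x0 <= x <= x1 and y0 <= y <= y1

    def trajectory_matches(trajectory, start_box, end_box) -> bool:
        """trajectory is a sequence of (x, y) coordinates in temporal order."""
        if len(trajectory) < 2:
            return False
        return in_box(trajectory[0], start_box) and in_box(trajectory[-1], end_box)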

FIG. 6 illustrates an example graphical representation of an alert directive and a focused alert directive applied to a particular alert within a scene, according to one embodiment. In this example, a behavioral recognition system is focused on a train platform. Images 605, 610, 615, and 620 all represent an image of the same alert provided to a user. Image 605 represents the original alert, with a bounding box 606 around a person (i.e., pixels classified by the machine learning engine as a person) who triggered the alert. For the purposes of this example, assume that the alert is an “unusual location” alert. In the behavioral recognition system, this alert data may include height and width pixel values of the object as well as the object's center (x,y) position. Image 610 represents a user creating an alert directive by drawing a wider bounding box 607 around the original alert. By creating a wider bounding box 607, the user sets larger tolerances for the machine learning engine to match when processing similar occurrences within that area. Thus, a person appearing in the shaded part of the scene depicted in the wider bounding box 607 triggers an alert directive for an “unusual location” alert.

The user may want the same “unusual location” alert directive to apply to objects appearing within the area of the scene corresponding to railroad tracks. Accordingly, the user may create a focused alert directive to accomplish this. Images 615 and 620 represent a user creating a focused alert directive for the “unusual location” alert depicted in image 605. To create a focused alert directive from an existing alert, a user first creates a bounding box 616 over a portion of the scene where the user would like to apply a focus mask. Within that bounding box, a user can select a region (or regions), and a focus mask results from the intersection of the bounding box and the selected region(s). Thereafter, if a person wanders onto the railroad tracks in the scene, the machine learning engine processes this behavior using the focused alert directive and publishes an alert.

As described, embodiments of the present invention provide techniques for configuring a behavioral recognition system to generate an alert. More specifically, by creating alert directives (or focused alert directives) for a machine learning engine to follow, certain events always or never result in an alert. Advantageously, this approach does not impede the unsupervised learning process of the behavioral recognition system because, when a behavioral event triggers an alert directive, the machine learning engine has already completed its learning process.

While the foregoing is directed to embodiments of the present disclosure, other and further embodiments of the disclosure may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

1. A method, comprising: obtaining characteristic values for an observed event in a scene; parsing a plurality of alert directives to identify a matching alert directive from the plurality of alert directives based on the characteristic values corresponding to a range of criteria values of the matching alert directive, the matching alert directive overriding an alert publication procedure of a behavioral recognition system; and upon identifying the matching alert directive, one of publishing or not publishing an alert, based on the overriding.

2. The method of claim 1, wherein the characteristic values include at least one of: a pixel-height value, a pixel-width value, and an x- and y-coordinate center position of a foreground object.

3. The method of claim 1, wherein the characteristic values include a set of x-coordinates and y-coordinates corresponding to a foreground object trajectory.

4. The method of claim 1, wherein the matching alert directive includes a focus mask that intersects with a region in the scene where the observed event occurred.

5. The method of claim 1, wherein the range of criteria values is adjustable via a graphical editor.

6. (canceled)

7. The method of claim 1, wherein parsing the plurality of alert directives includes matching an object type and a behavior.

8. A computer-readable storage medium storing instructions that, when executed on a processor, perform an operation comprising: obtaining characteristic values for an observed event in a scene; parsing a plurality of alert directives to identify a matching alert directive from the plurality of alert directives based on the characteristic values corresponding to a range of criteria values of the matching alert directive, the matching alert directive overriding an alert publication procedure of a behavioral recognition system; and upon identifying the matching alert directive, one of publishing or not publishing an alert, based on the overriding.

9. The computer-readable storage medium of claim 8, wherein the characteristic values include at least one of: a pixel-height value, a pixel-width value, and an x- and y-coordinate center position of a foreground object.

10. The computer-readable storage medium of claim 8, wherein the characteristic values include a set of x-coordinates and y-coordinates corresponding to a foreground object trajectory.

11. The computer-readable storage medium of claim 8, wherein the matching alert directive includes a focus mask that intersects with a region in the scene where the observed event occurred.

12. The computer-readable storage medium of claim 8, wherein the range of criteria values is adjustable via a graphical editor.

13. (canceled)

14. The computer-readable storage medium of claim 8, wherein parsing the plurality of alert directives includes matching an object type and a behavior.

15. A system, comprising: a processor; and a memory hosting an application, which, when executed on the processor, performs an operation for alerting a user to behavior corresponding to an alert directive, the operation comprising: obtaining characteristic values for an observed event in a scene; parsing a plurality of alert directives to identify a matching alert directive from the plurality of alert directives based on the characteristic values corresponding to a range of criteria values of the matching alert directive, the matching alert directive overriding an alert publication procedure of a behavioral recognition system; and upon identifying the matching alert directive, one of publishing or not publishing an alert, based on the overriding.

16. The system of claim 15, wherein the characteristic values include at least one of: a pixel-height value, a pixel-width value, and an x- and y-coordinate center position of a foreground object.

17. The system of claim 15, wherein the characteristic values include a set of x-coordinates and y-coordinates corresponding to a foreground object trajectory.

18. The system of claim 15, wherein the matching alert directive includes a focus mask that intersects with a region in the scene where the observed event occurred.

19. The system of claim 15, wherein the range of criteria values is adjustable via a graphical editor.

20. (canceled)

21. The system of claim 15, wherein parsing the plurality of alert directives includes matching an object type and a behavior.